[feature] additional optimizations #16

Open
iceseer wants to merge 11 commits into paritytech:main from iceseer:feature/optimization

iceseer commented Oct 20, 2025

Features

[feature] parallel calculations
[feature] branch prediction
[feature] prefetches
[feature] arena allocator

Additional options (features=)

parallel - enables most of the optimizations
arena - enables the arena allocator (enabling this option is not recommended at the moment)
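
As an illustration of how an opt-in Cargo feature such as `parallel` can switch between two code paths, here is a minimal sketch. It is not taken from this PR: `checksum` and `checksums` are hypothetical names, the per-chunk work is a toy fold, and scoped threads from the standard library stand in for Rayon so that the example has no dependencies.

```rust
/// Toy per-chunk work (hypothetical; stands in for hashing or encoding).
fn checksum(chunk: &[u8]) -> u64 {
    chunk
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

/// Compiled only when the crate is built with `--features parallel`.
#[cfg(feature = "parallel")]
fn checksums(chunks: &[Vec<u8>]) -> Vec<u64> {
    use std::thread;

    // Assumption: one worker per logical core, falling back to 1.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Slice size per worker; at least 1 because `chunks(0)` would panic.
    let per_worker = ((chunks.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .chunks(per_worker)
            .map(|part| s.spawn(move || part.iter().map(|c| checksum(c)).collect::<Vec<u64>>()))
            .collect();
        // Joining in spawn order keeps the output in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

/// Sequential fallback used when the `parallel` feature is off.
#[cfg(not(feature = "parallel"))]
fn checksums(chunks: &[Vec<u8>]) -> Vec<u64> {
    chunks.iter().map(|c| checksum(c)).collect()
}
```

Both variants return the same values in the same order, so callers do not need to know which one was compiled in.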

The number of threads for parallel computing can be set using the RAYON_NUM_THREADS environment variable. By default, it is equal to the number of logical cores.

RAYON_NUM_THREADS=4 cargo build --features parallel,simd
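
The rule above (an explicit `RAYON_NUM_THREADS` wins, otherwise the logical core count is used) can be sketched as follows. This is an illustration rather than Rayon's own code: `resolve_threads` and `effective_threads` are hypothetical helpers, and treating `0` or an unparsable value as "use the default" is an assumption of this sketch.

```rust
use std::thread;

/// Hypothetical helper: pick the worker count from the raw value of
/// RAYON_NUM_THREADS (if any) and the number of logical cores.
fn resolve_threads(env_value: Option<&str>, logical_cores: usize) -> usize {
    match env_value.and_then(|v| v.parse::<usize>().ok()) {
        // A positive integer overrides the default.
        Some(n) if n > 0 => n,
        // Unset, zero, or unparsable: fall back to the logical core count.
        _ => logical_cores,
    }
}

/// Reads the process environment at run time and applies the rule above.
fn effective_threads() -> usize {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let raw = std::env::var("RAYON_NUM_THREADS").ok();
    resolve_threads(raw.as_deref(), cores)
}
```

Since Rayon reads the variable when the thread pool is created in the running program, it presumably also needs to be set for the command that runs the benchmarks or the application, not only for the build.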

Results

master

construct/PoV: 131072 Chunks: 1023
                        time:   [734.44 µs 735.22 µs 735.74 µs]
                        thrpt:  [169.90 MiB/s 170.02 MiB/s 170.20 MiB/s]
construct/PoV: 131072 Chunks: 1024
                        time:   [614.16 µs 616.57 µs 618.48 µs]
                        thrpt:  [202.11 MiB/s 202.74 MiB/s 203.53 MiB/s]
construct/PoV: 1048576 Chunks: 1023
                        time:   [4.1591 ms 4.1618 ms 4.1641 ms]
                        thrpt:  [240.15 MiB/s 240.28 MiB/s 240.44 MiB/s]
construct/PoV: 1048576 Chunks: 1024
                        time:   [4.0365 ms 4.0396 ms 4.0422 ms]
                        thrpt:  [247.39 MiB/s 247.55 MiB/s 247.74 MiB/s]
construct/PoV: 5242880 Chunks: 1023
                        time:   [20.881 ms 20.907 ms 20.943 ms]
                        thrpt:  [238.74 MiB/s 239.16 MiB/s 239.45 MiB/s]
construct/PoV: 5242880 Chunks: 1024
                        time:   [22.498 ms 22.520 ms 22.535 ms]
                        thrpt:  [221.88 MiB/s 222.03 MiB/s 222.24 MiB/s]

reconstruct_regular/PoV: 131072 Chunks: 1023
                        time:   [942.23 µs 944.27 µs 945.90 µs]
                        thrpt:  [132.15 MiB/s 132.38 MiB/s 132.66 MiB/s]
reconstruct_regular/PoV: 131072 Chunks: 1024
                        time:   [910.97 µs 912.27 µs 913.86 µs]
                        thrpt:  [136.78 MiB/s 137.02 MiB/s 137.22 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1023
                        time:   [5.3444 ms 5.3599 ms 5.3695 ms]
                        thrpt:  [186.24 MiB/s 186.57 MiB/s 187.11 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1024
                        time:   [5.3483 ms 5.3549 ms 5.3616 ms]
                        thrpt:  [186.51 MiB/s 186.75 MiB/s 186.97 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1023
                        time:   [28.427 ms 28.457 ms 28.476 ms]
                        thrpt:  [175.59 MiB/s 175.70 MiB/s 175.89 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1024
                        time:   [28.557 ms 28.575 ms 28.593 ms]
                        thrpt:  [174.87 MiB/s 174.98 MiB/s 175.09 MiB/s]

reconstruct_systematic/PoV: 131072 Chunks: 1023
                        time:   [3.8896 µs 3.8971 µs 3.9025 µs]
                        thrpt:  [31.280 GiB/s 31.324 GiB/s 31.384 GiB/s]
reconstruct_systematic/PoV: 131072 Chunks: 1024
                        time:   [3.6804 µs 3.6950 µs 3.7231 µs]
                        thrpt:  [32.787 GiB/s 33.037 GiB/s 33.168 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1023
                        time:   [40.176 µs 40.293 µs 40.367 µs]
                        thrpt:  [24.192 GiB/s 24.237 GiB/s 24.307 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1024
                        time:   [40.882 µs 41.266 µs 41.478 µs]
                        thrpt:  [23.544 GiB/s 23.665 GiB/s 23.887 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1023
                        time:   [202.30 µs 203.64 µs 204.52 µs]
                        thrpt:  [23.875 GiB/s 23.978 GiB/s 24.136 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1024
                        time:   [200.78 µs 201.15 µs 201.55 µs]
                        thrpt:  [24.226 GiB/s 24.275 GiB/s 24.320 GiB/s]

merklize/PoV: 131072 Chunks: 1023
                        time:   [747.82 µs 749.57 µs 751.27 µs]
                        thrpt:  [166.38 MiB/s 166.76 MiB/s 167.15 MiB/s]
merklize/PoV: 131072 Chunks: 1024
                        time:   [622.28 µs 623.07 µs 623.77 µs]
                        thrpt:  [200.39 MiB/s 200.62 MiB/s 200.87 MiB/s]
merklize/PoV: 1048576 Chunks: 1023
                        time:   [3.3391 ms 3.3513 ms 3.3614 ms]
                        thrpt:  [297.50 MiB/s 298.40 MiB/s 299.48 MiB/s]
merklize/PoV: 1048576 Chunks: 1024
                        time:   [3.2344 ms 3.2420 ms 3.2498 ms]
                        thrpt:  [307.71 MiB/s 308.46 MiB/s 309.18 MiB/s]
merklize/PoV: 5242880 Chunks: 1023
                        time:   [15.130 ms 15.152 ms 15.166 ms]
                        thrpt:  [329.68 MiB/s 330.00 MiB/s 330.47 MiB/s]
merklize/PoV: 5242880 Chunks: 1024
                        time:   [15.034 ms 15.066 ms 15.090 ms]
                        thrpt:  [331.34 MiB/s 331.88 MiB/s 332.57 MiB/s]

verify_chunk/PoV: 131072 Chunks: 1023
                        time:   [1.8991 µs 1.9013 µs 1.9046 µs]
                        thrpt:  [64.093 GiB/s 64.204 GiB/s 64.276 GiB/s]
verify_chunk/PoV: 131072 Chunks: 1024
                        time:   [1.7756 µs 1.7800 µs 1.7852 µs]
                        thrpt:  [68.380 GiB/s 68.580 GiB/s 68.750 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1023
                        time:   [4.2870 µs 4.3015 µs 4.3138 µs]
                        thrpt:  [226.38 GiB/s 227.03 GiB/s 227.80 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1024
                        time:   [4.1736 µs 4.1893 µs 4.2050 µs]
                        thrpt:  [232.24 GiB/s 233.11 GiB/s 233.99 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1023
                        time:   [15.244 µs 15.272 µs 15.298 µs]
                        thrpt:  [319.18 GiB/s 319.73 GiB/s 320.31 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1024
                        time:   [15.069 µs 15.114 µs 15.170 µs]
                        thrpt:  [321.88 GiB/s 323.07 GiB/s 324.02 GiB/s]

feature/optimized "simd,parallel"

construct/PoV: 131072 Chunks: 1023
                        time:   [510.44 µs 518.70 µs 523.39 µs]
                        thrpt:  [238.83 MiB/s 240.99 MiB/s 244.89 MiB/s]
construct/PoV: 131072 Chunks: 1024
                        time:   [491.78 µs 499.87 µs 505.45 µs]
                        thrpt:  [247.30 MiB/s 250.06 MiB/s 254.18 MiB/s]
construct/PoV: 1048576 Chunks: 1023
                        time:   [3.8102 ms 4.0263 ms 4.2182 ms]
                        thrpt:  [237.07 MiB/s 248.37 MiB/s 262.45 MiB/s]
construct/PoV: 1048576 Chunks: 1024
                        time:   [3.3056 ms 3.4247 ms 3.5951 ms]
                        thrpt:  [278.16 MiB/s 291.99 MiB/s 302.52 MiB/s]
construct/PoV: 5242880 Chunks: 1023
                        time:   [11.770 ms 12.076 ms 12.341 ms]
                        thrpt:  [405.16 MiB/s 414.05 MiB/s 424.81 MiB/s]
construct/PoV: 5242880 Chunks: 1024
                        time:   [10.573 ms 10.729 ms 10.889 ms]
                        thrpt:  [459.17 MiB/s 466.01 MiB/s 472.90 MiB/s]

reconstruct_regular/PoV: 131072 Chunks: 1023
                        time:   [1.1142 ms 1.1174 ms 1.1207 ms]
                        thrpt:  [111.54 MiB/s 111.86 MiB/s 112.19 MiB/s]
reconstruct_regular/PoV: 131072 Chunks: 1024
                        time:   [1.0434 ms 1.0461 ms 1.0485 ms]
                        thrpt:  [119.22 MiB/s 119.50 MiB/s 119.81 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1023
                        time:   [5.7017 ms 5.7171 ms 5.7359 ms]
                        thrpt:  [174.34 MiB/s 174.92 MiB/s 175.39 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1024
                        time:   [5.9081 ms 5.9204 ms 5.9383 ms]
                        thrpt:  [168.40 MiB/s 168.91 MiB/s 169.26 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1023
                        time:   [30.485 ms 30.520 ms 30.559 ms]
                        thrpt:  [163.62 MiB/s 163.83 MiB/s 164.01 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1024
                        time:   [30.348 ms 30.410 ms 30.463 ms]
                        thrpt:  [164.13 MiB/s 164.42 MiB/s 164.76 MiB/s]

reconstruct_systematic/PoV: 131072 Chunks: 1023
                        time:   [4.4633 µs 4.4745 µs 4.4908 µs]
                        thrpt:  [27.183 GiB/s 27.281 GiB/s 27.350 GiB/s]
reconstruct_systematic/PoV: 131072 Chunks: 1024
                        time:   [4.2831 µs 4.2918 µs 4.3010 µs]
                        thrpt:  [28.382 GiB/s 28.442 GiB/s 28.500 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1023
                        time:   [44.509 µs 44.593 µs 44.663 µs]
                        thrpt:  [21.865 GiB/s 21.899 GiB/s 21.941 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1024
                        time:   [43.186 µs 43.294 µs 43.375 µs]
                        thrpt:  [22.515 GiB/s 22.557 GiB/s 22.613 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1023
                        time:   [225.61 µs 226.08 µs 226.63 µs]
                        thrpt:  [21.546 GiB/s 21.598 GiB/s 21.643 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1024
                        time:   [213.87 µs 214.43 µs 214.87 µs]
                        thrpt:  [22.724 GiB/s 22.771 GiB/s 22.831 GiB/s]

merklize/PoV: 131072 Chunks: 1023
                        time:   [449.20 µs 452.96 µs 455.20 µs]
                        thrpt:  [274.60 MiB/s 275.97 MiB/s 278.27 MiB/s]
merklize/PoV: 131072 Chunks: 1024
                        time:   [452.59 µs 460.99 µs 469.91 µs]
                        thrpt:  [266.01 MiB/s 271.15 MiB/s 276.19 MiB/s]
merklize/PoV: 1048576 Chunks: 1023
                        time:   [1.1936 ms 1.2028 ms 1.2104 ms]
                        thrpt:  [826.17 MiB/s 831.42 MiB/s 837.79 MiB/s]
merklize/PoV: 1048576 Chunks: 1024
                        time:   [1.1964 ms 1.2112 ms 1.2219 ms]
                        thrpt:  [818.39 MiB/s 825.64 MiB/s 835.82 MiB/s]
merklize/PoV: 5242880 Chunks: 1023
                        time:   [3.9726 ms 4.0436 ms 4.1057 ms]
                        thrpt:  [1.1893 GiB/s 1.2075 GiB/s 1.2291 GiB/s]
merklize/PoV: 5242880 Chunks: 1024
                        time:   [3.8041 ms 3.8699 ms 3.9361 ms]
                        thrpt:  [1.2405 GiB/s 1.2617 GiB/s 1.2836 GiB/s]

verify_chunk/PoV: 131072 Chunks: 1023
                        time:   [1.6454 µs 1.6549 µs 1.6701 µs]
                        thrpt:  [73.093 GiB/s 73.762 GiB/s 74.191 GiB/s]
verify_chunk/PoV: 131072 Chunks: 1024
                        time:   [1.4936 µs 1.5016 µs 1.5135 µs]
                        thrpt:  [80.653 GiB/s 81.294 GiB/s 81.730 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1023
                        time:   [3.8267 µs 3.8389 µs 3.8525 µs]
                        thrpt:  [253.48 GiB/s 254.38 GiB/s 255.20 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1024
                        time:   [3.6486 µs 3.6626 µs 3.6818 µs]
                        thrpt:  [265.24 GiB/s 266.63 GiB/s 267.66 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1023
                        time:   [13.767 µs 13.829 µs 13.892 µs]
                        thrpt:  [351.49 GiB/s 353.08 GiB/s 354.68 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1024
                        time:   [13.548 µs 13.617 µs 13.692 µs]
                        thrpt:  [356.62 GiB/s 358.57 GiB/s 360.41 GiB/s]

feature/optimized "simd"

construct/PoV: 131072 Chunks: 1023
                        time:   [617.80 µs 618.19 µs 618.52 µs]
                        thrpt:  [202.09 MiB/s 202.20 MiB/s 202.33 MiB/s]
construct/PoV: 131072 Chunks: 1024
                        time:   [509.04 µs 509.35 µs 509.51 µs]
                        thrpt:  [245.34 MiB/s 245.41 MiB/s 245.56 MiB/s]
construct/PoV: 1048576 Chunks: 1023
                        time:   [3.7070 ms 3.7095 ms 3.7114 ms]
                        thrpt:  [269.44 MiB/s 269.58 MiB/s 269.76 MiB/s]
construct/PoV: 1048576 Chunks: 1024
                        time:   [3.5996 ms 3.6012 ms 3.6029 ms]
                        thrpt:  [277.56 MiB/s 277.69 MiB/s 277.81 MiB/s]
construct/PoV: 5242880 Chunks: 1023
                        time:   [18.909 ms 18.966 ms 19.050 ms]
                        thrpt:  [262.46 MiB/s 263.62 MiB/s 264.43 MiB/s]
construct/PoV: 5242880 Chunks: 1024
                        time:   [20.120 ms 20.139 ms 20.154 ms]
                        thrpt:  [248.09 MiB/s 248.27 MiB/s 248.51 MiB/s]

reconstruct_regular/PoV: 131072 Chunks: 1023
                        time:   [941.99 µs 944.66 µs 947.65 µs]
                        thrpt:  [131.90 MiB/s 132.32 MiB/s 132.70 MiB/s]
reconstruct_regular/PoV: 131072 Chunks: 1024
                        time:   [933.19 µs 937.18 µs 940.97 µs]
                        thrpt:  [132.84 MiB/s 133.38 MiB/s 133.95 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1023
                        time:   [5.3582 ms 5.3729 ms 5.3848 ms]
                        thrpt:  [185.71 MiB/s 186.12 MiB/s 186.63 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1024
                        time:   [5.1137 ms 5.1234 ms 5.1289 ms]
                        thrpt:  [194.97 MiB/s 195.18 MiB/s 195.55 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1023
                        time:   [28.670 ms 28.712 ms 28.743 ms]
                        thrpt:  [173.96 MiB/s 174.14 MiB/s 174.40 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1024
                        time:   [29.408 ms 29.455 ms 29.499 ms]
                        thrpt:  [169.50 MiB/s 169.75 MiB/s 170.02 MiB/s]

reconstruct_systematic/PoV: 131072 Chunks: 1023
                        time:   [4.4606 µs 4.4706 µs 4.4823 µs]
                        thrpt:  [27.234 GiB/s 27.305 GiB/s 27.366 GiB/s]
reconstruct_systematic/PoV: 131072 Chunks: 1024
                        time:   [3.5982 µs 3.6523 µs 3.7379 µs]
                        thrpt:  [32.658 GiB/s 33.423 GiB/s 33.925 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1023
                        time:   [40.327 µs 40.892 µs 41.366 µs]
                        thrpt:  [23.608 GiB/s 23.881 GiB/s 24.216 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1024
                        time:   [44.721 µs 45.411 µs 45.861 µs]
                        thrpt:  [21.294 GiB/s 21.505 GiB/s 21.837 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1023
                        time:   [213.77 µs 214.14 µs 214.76 µs]
                        thrpt:  [22.736 GiB/s 22.802 GiB/s 22.842 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1024
                        time:   [204.88 µs 207.04 µs 209.81 µs]
                        thrpt:  [23.272 GiB/s 23.584 GiB/s 23.832 GiB/s]

merklize/PoV: 131072 Chunks: 1023
                        time:   [574.87 µs 579.48 µs 587.38 µs]
                        thrpt:  [212.81 MiB/s 215.71 MiB/s 217.44 MiB/s]
merklize/PoV: 131072 Chunks: 1024
                        time:   [485.66 µs 486.89 µs 488.34 µs]
                        thrpt:  [255.97 MiB/s 256.73 MiB/s 257.38 MiB/s]
merklize/PoV: 1048576 Chunks: 1023
                        time:   [2.6842 ms 2.6933 ms 2.7004 ms]
                        thrpt:  [370.31 MiB/s 371.30 MiB/s 372.55 MiB/s]
merklize/PoV: 1048576 Chunks: 1024
                        time:   [2.6215 ms 2.6251 ms 2.6272 ms]
                        thrpt:  [380.63 MiB/s 380.94 MiB/s 381.46 MiB/s]
merklize/PoV: 5242880 Chunks: 1023
                        time:   [12.834 ms 12.862 ms 12.887 ms]
                        thrpt:  [388.00 MiB/s 388.73 MiB/s 389.58 MiB/s]
merklize/PoV: 5242880 Chunks: 1024
                        time:   [12.780 ms 12.800 ms 12.824 ms]
                        thrpt:  [389.88 MiB/s 390.64 MiB/s 391.23 MiB/s]

verify_chunk/PoV: 131072 Chunks: 1023
                        time:   [1.3788 µs 1.3822 µs 1.3851 µs]
                        thrpt:  [88.128 GiB/s 88.314 GiB/s 88.532 GiB/s]
verify_chunk/PoV: 131072 Chunks: 1024
                        time:   [1.3032 µs 1.3047 µs 1.3065 µs]
                        thrpt:  [93.431 GiB/s 93.559 GiB/s 93.672 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1023
                        time:   [3.2892 µs 3.3222 µs 3.3598 µs]
                        thrpt:  [290.66 GiB/s 293.95 GiB/s 296.90 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1024
                        time:   [3.1637 µs 3.1658 µs 3.1686 µs]
                        thrpt:  [308.20 GiB/s 308.47 GiB/s 308.68 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1023
                        time:   [12.013 µs 12.021 µs 12.029 µs]
                        thrpt:  [405.91 GiB/s 406.18 GiB/s 406.46 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1024
                        time:   [11.938 µs 11.949 µs 11.959 µs]
                        thrpt:  [408.29 GiB/s 408.64 GiB/s 409.02 GiB/s]

Comparison

[Comparison charts (images not included): b-const, b-rec-reg, b-rec-sys, b-merk, b-ver]
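
The medians in Results can be turned into speedup factors directly. The sketch below does this for the largest case (PoV 5242880, 1024 chunks), comparing master against "simd,parallel". The function names are illustrative; the timings are the median values copied from the listings above, converted to milliseconds.

```rust
/// Throughput in MiB/s for `bytes` processed in `seconds`.
fn mib_per_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / seconds
}

/// Speedup of `new` over `base` (same time unit); > 1.0 means faster.
fn speedup(base: f64, new: f64) -> f64 {
    base / new
}

/// (benchmark, speedup) for PoV 5242880, 1024 chunks.
fn report() -> Vec<(&'static str, f64)> {
    // (benchmark, master median in ms, "simd,parallel" median in ms)
    let rows = [
        ("construct", 22.520, 10.729),
        ("reconstruct_regular", 28.575, 30.410),
        ("reconstruct_systematic", 0.20115, 0.21443),
        ("merklize", 15.066, 3.8699),
        ("verify_chunk", 0.015114, 0.013617),
    ];
    rows.iter()
        .map(|&(name, base, new)| (name, speedup(base, new)))
        .collect()
}
```

With these inputs, construct comes out at roughly 2.10x and merklize at roughly 3.89x, while reconstruct_regular and reconstruct_systematic come out at about 0.94x, i.e. slightly slower than master. `mib_per_s(5_242_880, 0.022520)` reproduces the reported master construct throughput of about 222 MiB/s, which suggests the throughput figures are PoV bytes divided by time.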
